Step 1: Pseudocode

LaTeX pseudocode is located here. "English" pseudocode is located here (this was written before I knew how to use LaTeX).

Also needed for the implementation of kernel probability density functions: KDE Python Notebook (code.md approved)

As of 11/26/2016:

Adding a Kurtosis Python Notebook as well as a Spectral Analysis notebook.

Both pseudocodes can be found in the updated LaTeX in the file above.


Step 2: Simulations & Details of Their Parameters

For all sine waves created: sample 1000 points from each wave to represent the data.

Success:

  • 50 sine waves, all 50 the same wave
    • Should result in none selected
  • 50 sine waves, all 50 the same wave, 1 with lots of white noise
    • Should result in white noise one selected
  • 50 sine waves, all same frequency but different amplitudes, 1 with white noise
    • Should result in white noise one selected

Hopeful:

  • 40 sine waves, 32 with slight noise, 8 with heavy noise (CLOSEST TO REAL DATA)

Fail:

  • 50 sine waves, all same frequency but different amplitudes, 40 with white noise
    • Result unknown; either none will be selected or multiple will be

Step 3: Choose Visualization

  1. Will use connected scatter plots to show the simulated data (sine waves).
  2. Then, after finding the joint probabilities across each of the sine waves (i.e., the electrodes), plot the joint probabilities as a distribution via KDE.
  3. Finally, compare each wave's number of standard deviations from the mean to a threshold to decide whether its data remains. Print which waves stayed, and plot them (not sure if that last part is really necessary).

Step 4: Choose Metrics for Evaluating Performance

In short: whether we are able to select the poor waves, so that the final dataset no longer contains the really poor ones.
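This evaluation can be phrased as simple set comparisons between detected and expected bad-wave indices. A minimal sketch (the index values here are hypothetical, not from the simulations below):

```python
# Hypothetical detected/expected bad-wave index sets for one simulation
detected = {4, 12}
expected = {4}

# Exact-match criterion: did we flag exactly the poor waves?
exact = (detected == expected)

# Finer-grained view: poor waves we missed, and good waves removed by mistake
missed = expected - detected
false_alarms = detected - expected
```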


Step 5: Generating Simulated Data

Start by generating each set of data from above.

In [3]:
import numpy as np
# Fix random seed
initseed = 123456789
np.random.seed(initseed)
In [4]:
numvals = 1000
# First, build the relevant linspace to grab 1000 points
times = np.linspace(0, 1, numvals)
# Then define the general sine wave used throughout
sin = np.sin(2 * np.pi * times)
# Define function for white noise
def gen_whitenoise(mean, std, size):
    retval = np.random.normal(mean, std, size=size)
    return retval
In [5]:
# Success Case 1 Data
# 50 of the same sine waves
success1 = np.column_stack([sin] * 50)
#print success1.shape
In [6]:
# Success Case 2 Data
# 50 same sine waves, 1 with white noise
success2 = np.column_stack([sin] * 50)
wn = gen_whitenoise(0, 10, numvals)
success2[:, 4] += wn
#print success2[0]
In [7]:
# Success Case 3 Data
# 50 different amplitude sine waves, 1 with white noise
success3 = np.column_stack([sin] * 10 +
                           [sin * 2] * 10 +
                           [sin * 3] * 10 +
                           [sin * 4] * 10 +
                           [sin * 5] * 10)
success3[:, 49] += wn
#print success3[0]
#print success3[1]
In [8]:
# Hopeful Case 1 Data
# 40 sine waves, 32 each with small amounts of white noise, 8 with a lot
hope1 = np.column_stack([sin] * 40)
for i in range(0, 32):
    hope1[:, i] += gen_whitenoise(0, 0.25, numvals)
for i in range(32, 40):
    hope1[:, i] += gen_whitenoise(0, 20, numvals)
    
#print hope1[:, 0]
In [9]:
# Fail Case 1 Data
# 50 same sine waves, 40 with white noise
fail1 = np.column_stack([sin] * 50)
for i in range(0, 40):
    fail1[:, i] = fail1[:, i] + gen_whitenoise(0, 2, numvals)
    
#print fail1[0]

Step 6: Plot Simulated Data

In [10]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot
from plotly.graph_objs import *
from plotly import tools
init_notebook_mode()
In [11]:
# Success 1

# Setup plotly data
datasets =[]

for i in range(0, 5):
    datasets.append(Scatter(
        x = times,
        y = success1[:,i],
        name = 'Sample ' + str(i)
    ))

# Setup layout

layout = dict(title = 'Success 1',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Unit'),
              )

# Make figure object

fig = dict(data=datasets, layout=layout)

iplot(fig)
In [12]:
# Success 2

# Setup plotly data
datasets =[]

for i in range(0, 5):
    datasets.append(Scatter(
        x = times,
        y = success2[:,i],
        name = 'Sample ' + str(i)
    ))

# Setup layout

layout = dict(title = 'Success 2',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Unit'),
              )

# Make figure object

fig = dict(data=datasets, layout=layout)

iplot(fig)
In [13]:
# Success 3

# Setup plotly data
datasets =[]

for i in range(0, success3.shape[1]):
    datasets.append(Scatter(
        x = times,
        y = success3[:,i],
        name = 'Sample ' + str(i)
    ))

# Setup layout

layout = dict(title = 'Success 3',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Unit'),
              )

# Make figure object

fig = dict(data=datasets, layout=layout)

iplot(fig)
In [14]:
# Hopeful 1

# Setup plotly data
datasets =[]

for i in range(0, 10):
    datasets.append(Scatter(
        x = times,
        y = hope1[:,i],
        name = 'Sample ' + str(i)
    ))

# Setup layout

layout = dict(title = 'Hope 1',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Unit'),
              )

# Make figure object

fig = dict(data=datasets, layout=layout)

iplot(fig)
In [15]:
# Fail 1

# Setup plotly data
datasets =[]

for i in range(0, 5):
    datasets.append(Scatter(
        x = times,
        y = fail1[:,i],
        name = 'Sample ' + str(i)
    ))

# Setup layout

layout = dict(title = 'Fail 1',
              xaxis = dict(title = 'Time'),
              yaxis = dict(title = 'Unit'),
              )

# Make figure object

fig = dict(data=datasets, layout=layout)

iplot(fig)

Step 7: Write Algorithm Code

Go step by step.

Reshape Data!

In [16]:
# Reshape Data: Run with no reshaping

print len(success1.shape)

def reshape(inEEG):
    # Collapse a (times x electrodes x trials) array into
    # (times * trials x electrodes); 1D and 2D inputs pass through unchanged
    if len(inEEG.shape) == 3:
        return np.reshape(inEEG, (inEEG.shape[0] * inEEG.shape[2], inEEG.shape[1]))
    elif len(inEEG.shape) != 1 and len(inEEG.shape) != 2:
        # unsupported dimensionality
        print "fail"
    else:
        return inEEG

print "Final dimensions:", reshape(success1).shape
2
Final dimensions: (1000L, 50L)
In [17]:
# Reshape Data: Run with reshaping

success1 = np.expand_dims(success1, 2)
print len(success1.shape)
print success1.shape
print "Final dimensions:", reshape(success1).shape
success1 = reshape(success1)
3
(1000L, 50L, 1L)
Final dimensions: (1000L, 50L)
In [18]:
# Reshape Data: run with data where reshaping actually matters
dummy = np.dstack([success1] * 2)
print dummy.shape
print dummy[:,:,1].shape
print "Final dimensions:", reshape(dummy).shape
(1000L, 50L, 2L)
(1000L, 50L)
Final dimensions: (2000L, 50L)

Kernel density probability dist

In [19]:
# Now define the probability density functions tested in other notebooks
from sklearn.neighbors.kde import KernelDensity
from sklearn.grid_search import GridSearchCV
# Run the KDE!
def kdewrap(indata, kernel):
    grid = GridSearchCV(KernelDensity(),
                    {'bandwidth': np.linspace(0.1, 1.0, 30)},
                    cv=10) # 10-fold cross-validation
    grid.fit(indata[:, None])
    kde = KernelDensity(kernel=kernel, bandwidth=grid.best_params_["bandwidth"]).fit(indata[:, np.newaxis])
    return kde.score_samples(indata[:, np.newaxis])
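As a sanity check on kdewrap, the same joint log-likelihood idea can be sketched with scipy's gaussian_kde, which picks its bandwidth by Scott's rule instead of cross-validation (so the numbers will differ slightly; this is an independent check, not the notebook's method):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
sample = rng.normal(0, 1, 500)

# Fit a Gaussian KDE and evaluate the log-density at each sample point
kde = gaussian_kde(sample)
logdens = np.log(kde(sample))

# Joint log-likelihood of the sample under its own KDE
jointprob = np.sum(logdens)
```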

Define what happens inside the per-electrode loop that the KDE will run in

In [20]:
regelec = success1[:, 0]
#print regelec
probdist = kdewrap(regelec, 'gaussian')
#print probdist
# get joint prob with assumption in log-likelihood format
# leave in log-likelihood to prevent underflow
jointprob = np.sum(probdist)
print jointprob
-888.71409723
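Why leave the result in log-likelihood form? Multiplying 1000 per-sample densities directly underflows float64, while summing their logs stays well within range. A quick illustration:

```python
import numpy as np

# 1000 per-sample densities, each comfortably representable on its own
dens = np.full(1000, 0.3)

# The direct product is about 10**-523, far below float64's ~1e-308 floor,
# so it underflows to exactly 0.0
direct = np.prod(dens)

# The log-likelihood sum stays well within range
loglik = np.sum(np.log(dens))
```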

These are really all the modular, testable parts of the code. They will now be combined into the full algorithms plus a final pruning step.

Algorithm Definition:

Probability Bad Electrode Detection

In [21]:
def prob_baddetec(inEEG, threshold, probfunc):
    electrodes = inEEG.shape[1]
    
    # Start by reshaping data (if necessary)
    if len(inEEG.shape) == 3:
        inEEG = np.reshape(inEEG, (inEEG.shape[0] * inEEG.shape[2], inEEG.shape[1]))
    elif len(inEEG.shape) != 1 and len(inEEG.shape) != 2:
        # fail case
        return -1
    
    # Then, initialize a probability vector of electrode length
    probvec = np.zeros(electrodes)
    
    # iterate through electrodes and get joint probs
    for i in range(0, electrodes):
        # get prob distribution
        probdist = probfunc(inEEG[:, i], 'gaussian')
        # using probdist find joint prob
        probvec[i] = np.sum(probdist)
    
    # normalize probvec
    # first calc mean
    avg = np.mean(probvec)
    # then std dev
    stddev = np.std(probvec)
    # then figure out which electrodes are bad
    badelec = []
    #print probvec
    for i in range(0, len(probvec)):
        #print i, avg, stddev, (avg - probvec[i]) / stddev
        if ((avg - probvec[i]) / stddev) >= threshold:
            badelec.append(i)
            
    return badelec
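The selection rule at the end of prob_baddetec flags any electrode whose joint log-likelihood falls `threshold` standard deviations below the mean. A minimal standalone sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical joint log-likelihoods for six electrodes; index 5 is an outlier
probvec = np.array([-880.0, -882.0, -881.0, -879.0, -883.0, -2000.0])

avg = np.mean(probvec)
stddev = np.std(probvec)

# Flag electrodes sitting `threshold` or more std devs below the mean
threshold = 2
badelec = [i for i in range(len(probvec))
           if (avg - probvec[i]) / stddev >= threshold]
```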

Kurtosis Bad Electrode Detection:

In [22]:
from scipy.stats import kurtosis
In [23]:
def kurt_baddetec(inEEG, threshold):
    electrodes = inEEG.shape[1]
    
    # Start by reshaping data (if necessary)
    if len(inEEG.shape) == 3:
        inEEG = np.reshape(inEEG, (inEEG.shape[0] * inEEG.shape[2], inEEG.shape[1]))
    elif len(inEEG.shape) != 1 and len(inEEG.shape) != 2:
        # fail case
        return -1
    
    # Then, initialize a probability vector of electrode length
    kurtvec = np.zeros(electrodes)
    
    # iterate through electrodes and get kurtoses
    for i in range(0, electrodes):
        # add kurtosis to the vector
        kurtvec[i] = kurtosis(inEEG[:, i])
        
    
    # normalize kurtvec
    # first calc mean
    avg = np.mean(kurtvec)
    # then std dev
    stddev = np.std(kurtvec)
    # then figure out which electrodes are bad
    badelec = []
    #print probvec
    for i in range(0, len(kurtvec)):
        #print i, avg, stddev, (avg - kurtvec[i]) / stddev
        if ((avg - kurtvec[i]) / stddev) >= threshold:
            badelec.append(i)
            
    return badelec
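A quick look at why kurtosis separates these signals. Using Fisher's (excess) definition, as scipy does, a pure sine is strongly platykurtic (about -1.5), while Gaussian white noise sits near 0:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.RandomState(0)
t = np.linspace(0, 1, 1000)

# Fisher (excess) kurtosis: a pure sine is platykurtic, roughly -1.5
kurt_sine = kurtosis(np.sin(2 * np.pi * t))

# Gaussian white noise has excess kurtosis near 0
kurt_noise = kurtosis(rng.normal(0, 10, 1000))
```

Note that because kurt_noise is above kurt_sine, a noisy electrode among clean sines lies above the mean kurtosis, so the one-sided (avg - kurt)/stddev >= threshold rule in kurt_baddetec never flags it; this is consistent with the empty kurtosis results in Step 10.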

Spectral Bad Electrode Detection:

In [24]:
def spec_baddetec(inEEG, posthresh, negthresh):
    electrodes = inEEG.shape[1]
    
    # Start by reshaping data (if necessary)
    if len(inEEG.shape) == 3:
        inEEG = np.reshape(inEEG, (inEEG.shape[0] * inEEG.shape[2], inEEG.shape[1]))
    elif len(inEEG.shape) != 1 and len(inEEG.shape) != 2:
        # fail case
        return -1
    
    # initialize badelec as an empty array
    badelec = []
    
    # iterate through electrodes and get spectral densities
    for i in range(0, electrodes):
        # get frequency spectrum for electrode
        sp = np.fft.fft(inEEG[:, i]).real
        sp = sp - np.mean(sp)
        for power in sp:
            if power > posthresh or power < negthresh:
                badelec.append(i)
                break
        
    return badelec
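spec_baddetec thresholds the (mean-subtracted) real part of the FFT. For intuition, here is a standalone sketch of the underlying fact: a clean sine concentrates its spectral magnitude in one bin, while white noise spreads it across all bins. This uses np.abs rather than .real, so it illustrates the idea rather than copying the function above:

```python
import numpy as np

rng = np.random.RandomState(0)
n = 1000
t = np.arange(n) / float(n)  # exactly periodic grid, so no spectral leakage

# A 1 Hz sine puts essentially all magnitude in frequency bin 1 (height n/2)
spec_sine = np.abs(np.fft.fft(np.sin(2 * np.pi * t)))
peak_bin = int(np.argmax(spec_sine[:n // 2]))

# Unit-variance white noise spreads magnitude roughly evenly, with every
# bin far below the sine's peak
spec_noise = np.abs(np.fft.fft(rng.normal(0, 1, n)))
```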
In [25]:
def good_elec(inEEG, badelec):
    return np.delete(inEEG, badelec, 1)
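good_elec is just a column deletion; a tiny standalone check of np.delete along axis 1 (the array values are illustrative):

```python
import numpy as np

data = np.arange(12).reshape(4, 3)  # 4 time points x 3 electrodes

# Dropping electrode (column) 1 keeps columns 0 and 2
kept = np.delete(data, [1], 1)
```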

Step 8: Write Qualitative Evaluation Code

Simply replot the signals minus the bad electrodes.

In [26]:
def qual_eval(badelec, expected):
    print "Expected:", expected
    print "Actual:", badelec

def qual_plot(data, title):
    # Setup plotly data
    datasets =[]
    for i in range(0, data.shape[1]):
        datasets.append(Scatter(
            x = times,
            y = data[:,i],
            name = 'Sample ' + str(i)
        ))

    # Setup layout

    layout = dict(title = title,
                  xaxis = dict(title = 'Time'),
                  yaxis = dict(title = 'Unit'),
                  )

    # Make figure object

    fig = dict(data=datasets, layout=layout)

    iplot(fig)

Step 9: Write Quantitative Evaluation Code

Just check that the correct bad electrodes were removed, and which ones.

In [27]:
def quant_eval(badelec, expected):
    return set(badelec) == set(expected)

Step 10: Run Qualitative Code

In [28]:
#dummybad_prob = prob_baddetec(success3, 3, kdewrap)
#dummybad_kurt = kurt_baddetec(success3, 3)
#dummybad_spec = spec_baddetec(success3, 10, -50)
In [1]:
#print dummybad_prob
#print dummybad_kurt
#print dummybad_spec
In [30]:
# First run on all datasets
s1probbad = prob_baddetec(success1, 3, kdewrap)
s2probbad = prob_baddetec(success2, 3, kdewrap)
s3probbad = prob_baddetec(success3, 3, kdewrap)
h1pobbad = prob_baddetec(hope1, 3, kdewrap)
f1probbad = prob_baddetec(fail1, 3, kdewrap)
In [31]:
s1bad = set(s1probbad)
s1kurtbad = kurt_baddetec(success1, 3)
s1bad.update(s1kurtbad)
s1specbad = spec_baddetec(success1, 10, -50)
s1bad.update(s1specbad)
print s1probbad
print s1kurtbad
print s1specbad
print s1bad
[]
[]
[]
set([])
In [32]:
s2bad = set(s2probbad)
s2kurtbad = kurt_baddetec(success2, 3)
s2bad.update(s2kurtbad)
s2specbad = spec_baddetec(success2, 10, -50)
s2bad.update(s2specbad)
print s2probbad
print s2kurtbad
print s2specbad
print s2bad
[4]
[]
[4]
set([4])
In [33]:
s3bad = set(s3probbad)
s3kurtbad = kurt_baddetec(success3, 3)
s3bad.update(s3kurtbad)
s3specbad = spec_baddetec(success3, 10, -50)
s3bad.update(s3specbad)
print s3probbad
print s3kurtbad
print s3specbad
print s3bad
In [47]:
h1bad = set(h1pobbad)
h1kurtbad = kurt_baddetec(hope1, 3)
h1bad.update(h1kurtbad)
h1specbad = spec_baddetec(hope1, 10, -50)
h1bad.update(h1specbad)
print h1pobbad
print h1kurtbad
print h1specbad
print h1bad
[]
[]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39])
In [38]:
f1bad = set(f1probbad)
f1kurtbad = kurt_baddetec(fail1, 3)
f1bad.update(f1kurtbad)
f1specbad = spec_baddetec(fail1, 10, -50)
f1bad.update(f1specbad)
print f1probbad
print f1kurtbad
print f1specbad
print f1bad
[]
[]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39])
In [39]:
qual_eval(list(s1bad), [])
qual_plot(good_elec(success1, list(s1bad)), "Success 1")
Expected: []
Actual: []
In [40]:
qual_eval(list(s2bad), [4])
qual_plot(good_elec(success2, list(s2bad)), "Success 2")
Expected: [4]
Actual: [4]
In [41]:
qual_eval(list(s3bad), [49])
qual_plot(good_elec(success3, list(s3bad)), "Success 3")
Expected: [49]
Actual: [49]
In [42]:
# just with prob baddetec 2.5 [32, 33, 37, 38]
qual_eval(list(h1bad), range(32,40))
qual_plot(good_elec(hope1, list(h1bad)), "Hope 1")
Expected: [32, 33, 34, 35, 36, 37, 38, 39]
Actual: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
---------------------------------------------------------------------------
PlotlyEmptyDataError                      Traceback (most recent call last)
<ipython-input-42-420b53768a38> in <module>()
      1 # just with prob baddetec 2.5 [32, 33, 37, 38]
      2 qual_eval(list(h1bad), range(32,40))
----> 3 qual_plot(good_elec(hope1, list(h1bad)), "Hope 1")

PlotlyEmptyDataError: Empty data list found. Make sure that you populated the list of data objects you're sending and try again.
In [43]:
qual_eval(list(f1bad), [])
qual_plot(good_elec(fail1, list(f1bad)), "Fail 1")
Expected: []
Actual: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]

Step 11: Run Quantitative Code

Honestly, anything the quantitative evaluation would reveal was already obtained earlier.


Step 12: Analyzing Evaluations

Unfortunately, what I expected to happen didn't exactly happen. At 3 standard deviations (the threshold the researchers used in their own probability implementation), none of the noisy data was flagged. Modifying the noise injected into the data by changing its scale (i.e., raising the noise standard deviation from 1 to 2 to 3) had no effect whatsoever on the outcome. One explanation that makes sense: because so many white-noise values are drawn, they closely approximate a normal distribution, and if they do, the distribution eventually modeled by the kernel density estimate is itself essentially normal, so the noisy electrodes' joint log-likelihoods do not stand out. I am not sure of the best way to deal with this problem (or whether I should fix the simulations instead), but I will ask Jovo for help before moving further.
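A quick check of the "noise looks normal" hypothesis, using D'Agostino and Pearson's normality test from scipy (my choice of test, not something from the notebook): a large p-value means the white-noise sample is statistically indistinguishable from a normal distribution, so a KDE fit to it will be essentially Gaussian.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.RandomState(123456789)
noise = rng.normal(0, 2, 1000)

# Large p-value: no evidence against normality for the injected noise
stat, pvalue = normaltest(noise)
```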

In [ ]: